Demonstrating the Feasibility of Line Intensity Mapping Using Mock Data of Galaxy Clustering from Simulations
Visbal & Loeb (2010) have shown that it is possible to measure the clustering of galaxies by
cross correlating the cumulative emission from two different spectral lines which originate at the 
same redshift. Through this cross correlation, one can study galaxies which are too faint to be 
individually resolved. This technique, known as intensity mapping, is a promising probe of the 
global properties of high redshift galaxies. Here, we test the feasibility of such measurements with 
synthetic data generated from cosmological dark matter simulations. We use a simple prescription 
for associating galaxies with dark matter halos and create a realization of emitted radiation as a 
function of angular position and wavelength over a patch of the sky. This is then used to create 
synthetic data for two different hypothetical instruments, one aboard the Space Infrared Telescope 
for Cosmology and Astrophysics (SPICA) and another consisting of a pair of ground based radio 
telescopes designed to measure the CO(1-0) and CO(2-1) emission lines. We find that the line cross
power spectrum can be measured accurately from the synthetic data with errors consistent with the 
analytical prediction of Visbal & Loeb (2010). Removal of astronomical backgrounds and masking 
bright line emission from foreground contaminating galaxies do not prevent accurate cross power 
spectrum measurements. 



I. INTRODUCTION 

Recently, Visbal & Loeb (2010) [1] suggested a new
technique for statistically observing the clustering of faint
galaxies through intensity mapping of multiple atomic
and molecular lines (see also [2-5]). This method can
probe galaxies which are too faint to be seen individually,
but which contribute significantly to the cumulative
emission due to their large numbers.

Atoms and molecules in the interstellar medium of 
galaxies produce line emission at particular rest frame 
wavelengths. For galaxies at cosmological distances,
these wavelengths are redshifted by a factor of (1 + z) 
due to cosmic expansion. Thus, for emission in a par- 
ticular spectral line, the observed angular position and 
the observed wavelength correspond to a 3D spatial loca- 
tion. With observational data which includes both spec- 
tral and spatial information, one can then measure the 
three dimensional clustering of galaxies. 
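As a small illustration of this mapping, the sketch below converts between rest frame wavelength, redshift, and observed wavelength, and lists which other lines would land on the same observed wavelength as a target line. The rest wavelengths are taken from Table I; the function names and the choice of OI(63 μm) at z = 6 as the target are only illustrative.

```python
# Rest-frame wavelengths in microns for a few of the lines listed in Table I.
REST_WAVELENGTH_UM = {
    "CII(158)": 158.0,
    "OI(145)": 145.0,
    "NII(122)": 122.0,
    "OIII(88)": 88.0,
    "OI(63)": 63.0,
    "NIII(57)": 57.0,
    "OIII(52)": 52.0,
}


def observed_wavelength(rest_um, z):
    """Observed wavelength of a line emitted at redshift z."""
    return rest_um * (1.0 + z)


def source_redshift(rest_um, observed_um):
    """Redshift at which a line must be emitted to be seen at observed_um.

    A negative value means the line can never appear at that wavelength.
    """
    return observed_um / rest_um - 1.0


def interlopers(target, z_target):
    """Other lines falling on the target's observed wavelength, with redshifts."""
    lam_obs = observed_wavelength(REST_WAVELENGTH_UM[target], z_target)
    found = {}
    for name, rest in REST_WAVELENGTH_UM.items():
        if name == target:
            continue
        z = source_redshift(rest, lam_obs)
        if z >= 0.0:
            found[name] = z
    return lam_obs, found


lam, others = interlopers("OI(63)", 6.0)
print(f"OI(63) from z = 6 is observed at {lam:.0f} um")
for name, z in sorted(others.items(), key=lambda kv: kv[1]):
    print(f"  {name} emitted at z = {z:.2f} lands on the same wavelength")
```

Each interloping line maps the same observed wavelength to a different redshift, which is why a single spectral channel contains a superposition of several widely separated slices of the Universe.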

Before line emission can be associated with a particular 
location in space, one must separate it from spectrally ex- 
tended emission. Galactic continuum emission and spec- 
trally smooth astrophysical foregrounds and backgrounds 
(e.g., the Cosmic Microwave Background or galactic dust 
emission) can be removed by fitting smooth functions of 
frequency to data and subtracting them away; this has 
been discussed extensively in the context of cosmological
21cm observations [6-10]. After background emission
is removed, one still needs to avoid possible confusion
with other emission lines. For multiple lines of different 
rest frame wavelengths the intensity at a particular ob- 
served wavelength corresponds to emission from multiple 
redshifts, one for each emission line. With both spa- 
tial and spectral information, the total emission over a 
small range in observed wavelength corresponds to a su- 
perposition of the 3D distribution of galaxies at different 
redshifts. 

Fortunately, it is possible to statistically isolate the
fluctuations from a particular redshift by cross correlating
the emission in two different lines [1]. If one compares
the fluctuations at two different wavelengths, which cor- 
respond to the same redshift for two different emission 
lines, the fluctuations will be strongly correlated. How- 
ever, the signal from any other lines arises from galaxies 
at different redshifts which are very far apart and thus 
will have much weaker correlation (see Figure 1). In this
way, one can measure either the two-point correlation 
function or power spectrum of galaxies at some target 
redshift weighted by the total emission in the spectral 
lines being cross correlated. 
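The following toy calculation sketches why the cross correlation isolates the common redshift. Two maps share the same underlying fluctuation field, scaled by hypothetical mean line intensities s1 and s2, while each map also carries its own independent contamination standing in for bad lines and detector noise. All numerical values are arbitrary assumptions chosen for illustration, not quantities from the simulations.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cross_correlation(a, b):
    """Zero-lag cross correlation <da * db> of two equally sampled maps."""
    mean_a, mean_b = mean(a), mean(b)
    return mean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])


rng = random.Random(1)
n = 20000

# Shared fluctuation field of the target galaxies (same redshift in both maps).
target = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical mean intensities of the two target lines.
s1, s2 = 2.0, 3.0

# Bad-line and noise fluctuations are drawn independently for each map.
map1 = [s1 * t + rng.gauss(0.0, 5.0) for t in target]
map2 = [s2 * t + rng.gauss(0.0, 5.0) for t in target]

auto = cross_correlation(map1, map1)    # close to s1**2 + 5**2
cross = cross_correlation(map1, map2)   # close to s1 * s2
print("auto  map1:", auto)
print("cross 1x2 :", cross)
```

The auto correlation is dominated by the contamination, whereas the cross correlation converges on the product of the two target line amplitudes, with the contamination contributing only to the scatter.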

We emphasize that one can measure the line cross 
power spectrum from galaxies which are too faint to be 
seen individually over detector noise. Hence, a measure- 
ment of the line cross power spectrum can provide in- 
formation about the total line emission from all of the 
galaxies which are too faint to be directly detected. One 
possible application of this technique would be to mea- 
sure the evolution of line emission over cosmic time to 
better understand galaxy evolution and the sources that 
reionized the Universe. Changes in the minimum mass of
galaxies due to photoionization heating of the intergalactic
medium during reionization could also potentially be
measured.

FIG. 1: A slice from our simulated realization of line emission from galaxies at an observed wavelength of 441 μm (left) and
364 μm (right). The slice is in the plane of the sky and spans 250 × 250 comoving Mpc^2 with a depth of $\Delta\nu/\nu = 0.001$. The
colored squares indicate pixels in our SPICA example (presented below) which have line emission greater than 200 Jy/sr for
the left panel and 250 Jy/sr for the right panel. The emission from OI(63 μm) and OIII(52 μm) is shown in red on the left and
right panels, respectively, originating from the same galaxies at z = 6. All of the other lines in Table I are included and plotted
in blue. Cross correlating data at these two observed wavelengths would reveal the emission in OI and OIII from z = 6 with
the other emission lines being essentially uncorrelated.

Here we use cosmological simulations to test the fea- 
sibility of measuring the galaxy line cross power spec- 
trum. We create synthetic data sets for two hypotheti- 
cal instruments, one on the Space Infrared Telescope for 
Cosmology and Astrophysics (SPICA) and the other consisting
of a pair of ground based radio telescopes optimized
to measure CO(1-0) and CO(2-1) emission from
high redshifts. We test how well the cross power spectrum
can be measured and find agreement with the analytical
expectation derived in [1]. However, there are
some additional complications. Small $k$-modes along the
line of sight which are contaminated during the fore- 
ground removal process must be discarded, increasing 
the statistical uncertainty on large spatial scales. Addi- 
tionally, when masking out contaminating emission lines 
from bright foreground galaxies one must be careful not 
to introduce a spurious correlation between the data sets 
being cross correlated. 

The paper is organized as follows. In §2 we describe the 
methods used in this paper. This includes a brief review 
of the galaxy line cross power spectrum, a description of 
the synthetic data sets, the details of the simulations, and 
a discussion of the steps involved in measuring the cross 
power spectrum. In §3 and §4 we present our results for
the SPICA example and the CO(1-0) and CO(2-1) telescopes,
respectively. Finally, we discuss and summarize
our conclusions in §5. Throughout, we assume a ΛCDM
cosmology with $\Omega_\Lambda = 0.73$, $\Omega_m = 0.27$, $\Omega_b = 0.045$,
$h = 0.7$, $n_s = 0.96$ and $\sigma_8 = 0.8$ [H[.



II. METHOD 

A. Galaxy line cross power spectrum 

First, we briefly review the galaxy line cross power
spectrum. For a more complete discussion, see Visbal
& Loeb (2010) [1]. We assume that emission is measured
both as a function of angle on the sky and observed
wavelength. If one fits a smooth function of wavelength
along each direction on the sky and subtracts it
from the data, one obtains the fluctuations from the
average signal as a function of angle and wavelength:
$\Delta S(\theta_1, \theta_2, \nu) = S(\theta_1, \theta_2, \nu) - \bar{S}$. There is a one to one
correspondence between angular position and wavelength
and spatial position for emission in a particular line. For
convenience we use comoving coordinates at the location
of the target galaxies instead of angle and wavelength.
The fluctuations at a particular location result from a
number of different sources,

$$\Delta S_1 = \Delta S_{\rm line1} + \Delta S_{\rm noise} + \Delta S_{\rm badline1} + \Delta S_{\rm badline2} + \cdots, \tag{1}$$

which include contributions from the target galaxies we
wish to cross correlate, detector noise, and emission in
different lines from galaxies at different redshifts which
we refer to as "bad line" emission. One can cross correlate
the fluctuations in two different lines from the same
galaxies. We define the line cross correlation function as,

$$\xi_{1,2}(\mathbf{r}) = \langle \Delta S_1(\mathbf{r}_0, \mathbf{x})\, \Delta S_2(\mathbf{r}_0, \mathbf{r} + \mathbf{x}) \rangle, \tag{2}$$

where subscripts denote different lines being cross correlated.
The center of the survey volume is denoted by
$\mathbf{r}_0$, $\mathbf{x}$ is the position relative to the center in the first set of
fluctuations, and $\mathbf{r} + \mathbf{x}$ is the position relative to the center in
the second set of fluctuations.

Because the noise fluctuations in the two different data
sets are uncorrelated, and galaxies seen in different bad
lines will have very large separations and thus be essentially
uncorrelated, we are only left with contributions
from the target galaxies. On large scales we can make
the assumption that line fluctuations due to galaxy clustering
are given by $\Delta S_{\rm line1} = \bar{S}_1 \bar{b}\, \delta(\mathbf{r})$, where $\bar{S}_1$ is the
average target line signal, $\bar{b}$ is the luminosity weighted
average galaxy bias, and $\delta(\mathbf{r})$ is the cosmological overdensity
at a location $\mathbf{r}$. It follows that,

$$\xi_{1,2}(\mathbf{r}) = \langle \Delta S_{\rm line1}(\mathbf{r}_0, \mathbf{x})\, \Delta S_{\rm line2}(\mathbf{r}_0, \mathbf{r} + \mathbf{x}) \rangle = \bar{S}_1 \bar{S}_2 \bar{b}^2 \langle \delta(\mathbf{x})\, \delta(\mathbf{r} + \mathbf{x}) \rangle = \bar{S}_1 \bar{S}_2 \bar{b}^2 \xi(\mathbf{r}), \tag{3}$$

where $\xi(\mathbf{r})$ is the cosmological matter correlation function
and the subscript numbers denote the different lines
being cross correlated.

The line cross power spectrum is then defined as the
Fourier transform,

$$P_{1,2}(\mathbf{k}) = \int d^3r\, \xi_{1,2}(\mathbf{r})\, e^{i\mathbf{k}\cdot\mathbf{r}} = \bar{S}_1 \bar{S}_2 \bar{b}^2 P(\mathbf{k}) + P_{\rm shot}, \tag{4}$$

where $P_{\rm shot}$ is the shot-noise power spectrum due to the
discrete nature of galaxies.

An unbiased estimator for the cross power spectrum
is given by the product of the Fourier transforms of the
data sets,

$$\hat{P}_{1,2} = \frac{V}{2}\left( f_{\mathbf{k}}^{(1)} f_{\mathbf{k}}^{(2)*} + f_{\mathbf{k}}^{(1)*} f_{\mathbf{k}}^{(2)} \right), \tag{5}$$

where $V$ is the volume of the survey and the superscripts
denote the different lines being cross correlated. The
Fourier amplitude is given by,

$$f_{\mathbf{k}} = \int d^3r\, \Delta S(\mathbf{r}_0, \mathbf{r})\, W(\mathbf{r})\, e^{i\mathbf{k}\cdot\mathbf{r}}. \tag{6}$$

Here $W(\mathbf{r})$ is a window function that is constant over
the survey volume and zero at all other locations. It is
normalized such that $\int W(\mathbf{r})\, d^3r = 1$.
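A minimal one dimensional sketch of this estimator is given below. It reads Eq. (5) as the survey volume times the real part of the product of one Fourier amplitude with the complex conjugate of the other, which is how the estimator is described in Sec. II D, and it discretizes Eq. (6) with a window equal to 1/length inside the survey. The function names, the grid, and the mode amplitudes are illustrative assumptions.

```python
import cmath
import math


def fourier_amplitudes(delta_s, length):
    """Discrete 1D version of Eq. (6) with W = 1/length inside the survey."""
    n = len(delta_s)
    dx = length / n
    window = 1.0 / length
    amplitudes = []
    for m in range(n):
        k = 2.0 * math.pi * m / length
        total = sum(ds * window * cmath.exp(1j * k * j * dx) * dx
                    for j, ds in enumerate(delta_s))
        amplitudes.append(total)
    return amplitudes


def cross_power(delta_s1, delta_s2, length):
    """Eq. (5) in 1D: (V/2)(f1 f2* + f1* f2) = V * Re(f1 f2*), with V = length."""
    f1 = fourier_amplitudes(delta_s1, length)
    f2 = fourier_amplitudes(delta_s2, length)
    return [length * (a * b.conjugate()).real for a, b in zip(f1, f2)]


# Toy maps sharing a single mode (index 3) with hypothetical amplitudes 2 and 3.
n, length = 64, 100.0
phase = [2.0 * math.pi * 3 * j / n for j in range(n)]
map1 = [2.0 * math.cos(p) for p in phase]
map2 = [3.0 * math.cos(p) for p in phase]

power = cross_power(map1, map2, length)
print(power[3])   # close to length * (2/2) * (3/2) = 150
print(power[5])   # close to zero: no shared power in this mode
```

A cosine of amplitude A contributes A/2 to the Fourier amplitude at its own wavenumber, so the estimator returns the product of the two half amplitudes times the survey length.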

The root mean square (RMS) error in a measurement
of the cross power spectrum at one particular $k$-value is
given by [1],

$$\delta P_{1,2}^2 = \frac{1}{2}\left( P_{1,2}^2 + P_{1\rm total} P_{2\rm total} \right), \tag{7}$$

where $P_{1\rm total}$ and $P_{2\rm total}$ are the total power spectra
corresponding to the first line and second line being cross
correlated. Each of these includes a term for the power
spectrum for each of the bad lines, the target line, and
detector noise (see Appendix A of Ref. [1]). When averaging
nearby values of the power spectrum this error
goes down by a factor of $\sqrt{N_{\rm modes}}$, where $N_{\rm modes}$ is the
number of statistically independent $k$-values at which the
power spectrum is measured.
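The error estimate can be sketched in a few lines. The code assumes that Eq. (7) has the form given in Ref. [1], namely that the squared error per mode is half the sum of the squared cross power and the product of the two total power spectra; the input numbers are arbitrary and serve only to show the scaling with the number of modes.

```python
import math


def cross_power_rms_error(p12, p1_total, p2_total, n_modes=1):
    """RMS error on the cross power spectrum.

    Assumes Eq. (7) reads dP^2 = (P12^2 + P1total * P2total) / 2 per mode,
    reduced by sqrt(n_modes) when independent modes are averaged.
    """
    single_mode = math.sqrt(0.5 * (p12 ** 2 + p1_total * p2_total))
    return single_mode / math.sqrt(n_modes)


# Arbitrary illustrative values in matching units.
err_one = cross_power_rms_error(1.0, 10.0, 10.0)
err_avg = cross_power_rms_error(1.0, 10.0, 10.0, n_modes=400)
print(err_one, err_avg)
```

Even when the total power in each map greatly exceeds the cross power, averaging a few hundred independent modes reduces the error by more than an order of magnitude.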



B. Synthetic data set 

In order to test the feasibility of measuring the line 
cross power spectrum we create synthetic data sets for 
instruments measuring both spatial and spectral infor- 
mation. Our goal is to produce a realization of the light 
from all galaxies as a function of angular position and 
observed wavelength on a patch of the sky. We create 
these data with a cosmological dark matter simulation 
(described in detail below). From the simulation we construct
a light cone which has the distribution of dark
matter halos which would be observed today in the volume
corresponding to an angular patch on the sky out
to a redshift of z = 10.

A simple prescription is used to associate galaxies with 
the dark matter halos from our simulation. We assign 
each galaxy a spectrum and assume that its intensity 
scales with star formation rate (SFR). The SFR versus 
halo mass relation is determined by matching comoving 
density with observed UV luminosity functions [13-17].

We assume that galaxies are found in dark matter halos
above a minimum mass, $M_{\rm min}$. After reionization $M_{\rm min}$
represents the threshold for assembling heated gas out
of the photo-ionized intergalactic medium, corresponding
to a minimum virial temperature of $\sim 10^5$ K [18]. We
assume that reionization was completed by a redshift of
$z \approx 10$. In all of the examples presented below, $M_{\rm min}$ is
set to correspond to this post-reionization requirement.

The larger dark matter halos in our simulation may
host multiple galaxies. To incorporate this effect in our
synthetic data we have used a simple prescription for
the halo occupation distribution. Following Ref. [19]
for the distribution of dark matter sub-halos, we consider
two different types of galaxies: central and satellite.
We assume that the distribution of central galaxies is a
step function: above $M_{\rm min}$ we assume each halo has one
galaxy at its center. We then assume that there are a
number of satellite galaxies given by a Poisson distribution
with a mean of $N_{\rm sat} = (M/M_1)^\beta$, where $\beta = 1$, and
$M_1 = 30 M_{\rm min}$ at z = 0-0.5; $M_1 = 20 M_{\rm min}$ at z = 0.5-2;
and $M_1 = 10 M_{\rm min}$ at z > 2. We distribute these galaxies
randomly, but weighted by an NFW profile, throughout
the larger host dark matter halo. We treat the central
and satellite galaxies as independent in assigning star formation
rates to them as explained below. We associate
half of the total halo mass to the central galaxy halos
and split the remainder of the mass equally to all of the
satellite galaxy halos.
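The occupation prescription can be sketched as follows. The Poisson sampler, the function names, and the treatment of a halo with no satellites (the central galaxy keeps the full halo mass) are assumptions made for this illustration; placing the satellites according to an NFW profile is omitted.

```python
import math
import random


def m1_over_mmin(z):
    """Satellite mass scale M_1 / M_min in the three redshift ranges quoted above."""
    if z < 0.5:
        return 30.0
    if z < 2.0:
        return 20.0
    return 10.0


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit, count, product = math.exp(-mean), 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def populate_halo(mass, m_min, z, rng, beta=1.0):
    """Return the masses assigned to the galaxies hosted by one halo."""
    if mass < m_min:
        return []                      # below the threshold: no galaxy
    mean_sat = (mass / (m1_over_mmin(z) * m_min)) ** beta
    n_sat = poisson(mean_sat, rng)
    if n_sat == 0:
        return [mass]                  # assumption: a lone central keeps all the mass
    central = 0.5 * mass               # half of the halo mass goes to the central
    satellite = 0.5 * mass / n_sat     # the remainder is split equally
    return [central] + [satellite] * n_sat


rng = random.Random(0)
galaxies = populate_halo(200.0, 1.0, 3.0, rng)   # masses in units of M_min
print(len(galaxies) - 1, "satellites; total mass", sum(galaxies))
```

With M = 200 M_min at z > 2 the mean number of satellites is 20, and the assigned masses always sum to the host halo mass.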

After relating galaxies to dark matter halos in the simulation
we produce a spectrum for each galaxy. For the
continuum, we take the measured spectral energy distribution
of M82 and scale it with the SFR [20]. The results
are insensitive to the particular choice of galaxy continuum,
as it is removed in the fitting and subtraction stage
of the data analysis, as discussed below.

In order to estimate the amplitude of line emission fluctuations
we assume a linear relationship between line luminosity,
$L$, and star formation rate, $\dot{M}_*$, $L = \dot{M}_* \times R$,
where $R$ is the ratio of line luminosity to SFR for
a particular line. This is similar to existing relations in
different bands (see Ref. [21]) and was used in the past
to estimate the strength of the galactic lines we consider
[22]. The values for relevant lines are shown in Table
I. For the first 7 lines, we use the same ratios, $R$, as
in Ref. [22], which were calculated by taking the geometric
average of the ratios from an observational sample of
lower redshift galaxies [23]. The other lines have been calibrated
based on the galaxy M82 [24]. We assign a width
to the lines based on the circular virial velocity of the
dark matter halos, but the results are mostly insensitive
to this choice for the spectral resolutions we consider in
our examples. This is because the majority of the signal
comes from lines which are spectrally unresolved.
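As a short worked example of this scaling, the sketch below converts a star formation rate into a line luminosity with ratios copied from Table I, and checks whether a line is spectrally unresolved at a given resolving power. Equating the line width with the halo circular velocity, and the specific velocities used, are assumptions for illustration only.

```python
C_KM_S = 299792.458   # speed of light in km/s

# Ratios R in L_sun per (M_sun/yr), copied from Table I for three lines.
R_TABLE = {"CII(158)": 6.0e6, "OI(63)": 3.8e6, "OIII(52)": 3.0e6}


def line_luminosity(sfr_msun_per_yr, line):
    """L = SFR x R, in solar luminosities."""
    return sfr_msun_per_yr * R_TABLE[line]


def is_unresolved(line_width_km_s, resolving_power):
    """True if the line is narrower than one spectral channel, c / R."""
    return line_width_km_s < C_KM_S / resolving_power


print(line_luminosity(10.0, "OI(63)"))   # L_sun for a 10 M_sun/yr galaxy
print(is_unresolved(150.0, 1000))        # narrower than the ~300 km/s channel
print(is_unresolved(400.0, 1000))        # broader than one channel
```

At a resolving power of 1000 a channel spans roughly 300 km/s, so only the most massive halos would produce lines wider than a single channel.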

For the results presented below we have made the sim- 
plification that all galaxies have the same R value for 
each emission line. Even if there is random scatter in 
the R values in each galaxy, the line cross power spec- 
trum will remain unchanged. This scatter will behave 
essentially like detector noise with intensity that is non- 
uniform across the data cube. 

We use observed UV luminosity functions [13-17] of
galaxies to calibrate the SFR assigned to dark matter
halos with an abundance matching technique. Given the
observed luminosity functions, we determine the number
density of galaxies as a function of SFR through the
relation,

$$L_{\rm UV} = L_\lambda \left( \frac{\dot{M}_*}{M_\odot\, {\rm yr}^{-1}} \right)\ {\rm ergs/s/Hz}, \tag{8}$$

where $L_\lambda$ is given by $L_\lambda = 8 \times 10^{27}$ at a rest frame
wavelength of $\lambda = 1500$ Å. This assumes a Salpeter
initial mass function from 0.1-125 $M_\odot$ and a constant
$\dot{M}_*$ over $\gtrsim 100$ Myr. The relationship between halo mass, $M_i$,
and SFR, $\dot{M}_{*i}$, at some particular mass, is found from the
relation $n_h(>M_i) = n_g(>\dot{M}_{*i})$. Here $n_h(>M)$ is the
number density of dark matter halos above mass $M$ in
our simulation and $n_g(>\dot{M}_*)$ is the number density of
galaxies implied by the UV luminosity function above
the SFR value, $\dot{M}_*$. This procedure is carried out in a
number of different redshift bins which cover our entire
light cone. As a simple correction for attenuation due
to dust we increase the SFR of all halos in each redshift
bin by a factor which sets the global SFR equal to that
given in Ref. [13] (the blue solid curve in Figure 10 of
Ref. [13]). In the highest redshift bin we do not apply
any dust correction. The particular parameters used for
the abundance matching procedure are listed in Table II.

Finally, we add detector noise and bright astronomical
foreground and background emission. For the examples
below we include both the CMB and emission
from dust in our galaxy. The dust emission is treated as
a black body with a $\nu^2$ emissivity scaled to match the
background radiation measured by COBE FIRAS in the
faintest area on the sky [25]. In Figure 2 we illustrate
the different components which make up our data sets.
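The abundance matching step, in which cumulative halo and galaxy number densities are equated, can be sketched as below. The cumulative galaxy density used here is a made-up analytic stand-in for the integral of an observed UV luminosity function converted to SFR with Eq. (8); its functional form, the halo masses, and the volume are all hypothetical.

```python
import math


def cumulative_galaxy_density(sfr):
    """Hypothetical n_g(>SFR) in Mpc^-3, decreasing monotonically with SFR."""
    return 1.0e-2 * sfr ** -0.7 * math.exp(-sfr / 50.0)


def sfr_at_density(n_target, lo=1e-6, hi=1e4):
    """Solve n_g(>SFR) = n_target by bisection in log(SFR)."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if cumulative_galaxy_density(mid) > n_target:
            lo = mid      # too many galaxies above mid, so the SFR must be higher
        else:
            hi = mid
    return math.sqrt(lo * hi)


def abundance_match(halo_masses, volume_mpc3):
    """Return (mass, SFR) pairs satisfying n_h(>M_i) = n_g(>SFR_i)."""
    ranked = sorted(halo_masses, reverse=True)
    pairs = []
    for rank, mass in enumerate(ranked, start=1):
        n_halo = rank / volume_mpc3   # halos at or above this mass per unit volume
        pairs.append((mass, sfr_at_density(n_halo)))
    return pairs


matched = abundance_match([1e10, 1e12, 1e11], volume_mpc3=1.0e6)
for mass, sfr in matched:
    print(f"M = {mass:.1e} M_sun  ->  SFR = {sfr:.2f} M_sun/yr")
```

Because both cumulative densities decrease monotonically, the mapping preserves rank: the most massive halo receives the highest SFR.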



TABLE I: Ratio between line luminosity, $L$, and star formation
rate, $\dot{M}_*$, for various lines. For the first 7 lines this ratio
is measured from a sample of low redshift galaxies. The other
lines have been calibrated based on the galaxy M82.

Species        Emission Wavelength [μm]    R [L_⊙/(M_⊙/yr)]
CII            158                         6.0 × 10^6
OI             145                         3.3 × 10^5
NII            122                         7.9 × 10^5
OIII           88                          2.3 × 10^6
OI             63                          3.8 × 10^6
NIII           57                          2.4 × 10^6
OIII           52                          3.0 × 10^6
12CO(1-0)      2610                        3.7 × 10^3
12CO(2-1)      1300                        2.8 × 10^4
12CO(3-2)      866                         7.0 × 10^4
12CO(4-3)      651                         9.7 × 10^4
12CO(5-4)      521                         9.6 × 10^4
12CO(6-5)      434                         9.5 × 10^4
12CO(7-6)      372                         8.9 × 10^4
12CO(8-7)      325                         7.7 × 10^4
12CO(9-8)      289                         6.9 × 10^4
12CO(10-9)     260                         5.3 × 10^4
12CO(11-10)    237                         3.8 × 10^4
12CO(12-11)    217                         2.6 × 10^4
12CO(13-12)    200                         1.4 × 10^4
CI             610                         1.4 × 10^4
CI             371                         4.8 × 10^4
NII            205                         2.5 × 10^5
13CO(5-4)      544                         3900
13CO(7-6)      389                         3200
13CO(8-7)      340                         2700
HCN(6-5)       564                         2100



TABLE II: Schechter function parameters for the UV Luminosity
Functions used to assign SFR to dark matter halos.
These parameters are used through abundance matching.

z           φ* (×10^-3 Mpc^-3)    M*        α        Ref.
0.0-0.5     4.07                  -18.05    -1.21    [14]
0.5-1.0     3.0                   -19.17    -1.52    [15]
1.0-1.5     1.26                  -20.08    -1.84    [15]
1.5-2.0     2.3                   -20.17    -1.60    [15]
2.0-2.7     2.75                  -20.7     -1.73    [13]
2.7-3.4     1.71                  -20.97    -1.73    [13]
3.4-4.5     1.3                   -20.98    -1.73    [16]
4.5-5.5     1.0                   -20.64    -1.66    [16]
5.5-6.5     1.4                   -20.24    -1.74    [16]
6.5-10.5    0.86                  -20.14    -2.01    [17]

FIG. 2: Various components of the synthetic data set for a typical line of sight. We plot data from the SPICA example discussed
below. The thick blue curve is the line emission from the target galaxies, the thin red dashed curve is the contribution from 
detector noise, and the thin black curve is the emission from all of the bad lines. We have not included the bright astrophysical 
foregrounds because they are orders of magnitude greater than all of the components plotted here. This emission along with 
galaxy continuum (not plotted) is removed in the fitting and subtraction step of measuring the power spectrum discussed in 
the text. 



C. Simulations 

To create our synthetic data we simulate the light cone
of dark matter in a 100 × 100 arcmin^2 angular patch of the
sky out to high redshift. We use a particle-multi-mesh
N-body code to evolve the dark matter distribution [26].
The simulation outputs are then stacked along the line
of sight out to z = 10.

For most of our light cone, we use an N-body simulation
with $2048^3$ dark matter particles on an effective
mesh with $7680^3$ cells in a comoving box with a length of
$200\,h^{-1}$ Mpc on a side. This length is sufficient to cover
the field of view out to the highest redshifts of interest.
For low redshifts (z < 1), we use a second larger simulation
to improve the sample variance of large halos. This
simulation also contains $2048^3$ dark matter particles and
a mesh of $7680^3$ cells, but has a length of $400\,h^{-1}$ Mpc
on each side of the box. In both simulations we identify
dark matter halos using a spherical overdensity algorithm.
This is done by examining snapshots taken every
20 Myr and 40 Myr in the $200\,h^{-1}$ Mpc and $400\,h^{-1}$ Mpc
simulations, respectively.
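For orientation, the particle mass implied by these box sizes and particle numbers follows from the mean matter density. The sketch below assumes that the particles carry the full matter density of the adopted cosmology ($\Omega_m = 0.27$) and uses the standard value of the critical density; the function name is illustrative.

```python
RHO_CRIT = 2.775e11   # critical density in h^2 M_sun / Mpc^3
OMEGA_M = 0.27        # matter density parameter adopted in this paper


def particle_mass(box_length_mpc_h, n_per_side):
    """Particle mass in M_sun/h for a box of side L (in Mpc/h) with n^3 particles."""
    return OMEGA_M * RHO_CRIT * box_length_mpc_h ** 3 / n_per_side ** 3


for box in (200.0, 400.0):
    print(f"{box:.0f} Mpc/h box: m_p = {particle_mass(box, 2048):.2e} M_sun/h")
```

Under these assumptions the two boxes correspond to particle masses of about $7 \times 10^7$ and $6 \times 10^8\,h^{-1} M_\odot$, the larger box being coarser by a factor of eight.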

The light cone is constructed from a series of red- 
shift zones, each zone spanning one comoving box length. 
Each zone is constructed from several redshift shells of 
thickness corresponding to a time interval of 20 or 40 
Myr depending on the box size. The shells are stacked
in a continuous fashion, but the zones are randomized
to eliminate any very long artificial structure. This pro- 
duces a discontinuity across zone boundaries. We are 
careful to only measure the power spectrum within one 
zone for our examples to avoid any problems associated 
with this discontinuity. 



D. Cross power spectrum measurement 

Measuring the cross power spectrum consists of three 
main steps: 

1. Fitting a smooth function of wavelength to each pixel 
and subtracting it away (note that we term each line of 
sight on the sky a "pixel" and each spectral component 
of the 3D data cube a "voxel"). 

2. Masking out voxels with bright bad line emission. 

3. Taking the product of the Fourier modes to estimate
the power spectrum and then averaging in spherical shells
in $k$-space.

We discuss these in turn. The fitting stage is necessary
because we seek to measure only the signal coming
from line emission and our data contains signal from
both galaxy continuum emission as well as bright astrophysical
foregrounds and backgrounds. Since these other
sources vary slowly in the spectral direction we can remove
them by fitting a smooth function of wavelength
to each pixel on the sky and subtracting it. This is the
same procedure which has been discussed extensively in
the context of cosmological measurements of 21cm radiation
from neutral hydrogen [6-10].

FIG. 3: The positions of dark matter halos from a slice of our simulated light cone projected onto the x-z plane (where z is the
direction along the line of sight). The slice has thickness $\Delta y = 0.5\,h^{-1}$ Mpc. Each dark point represents a dark matter halo.
Our light cone corresponds to 100 × 100 arcmin^2 on the sky. The details are explained in §2.

More specifically, with our data sets we fit a polyno- 
mial in wavelength to the spectrum in each pixel and 
then subtract it away. This removes the foregrounds and 
galaxy continuum, as well as some large scale fluctua- 
tions in line emission along the line of sight. In order to 
minimize loss of the line signal we do not include vox- 
els in our fit which contain bright line emission. We do 
this by an iterative fit: we fit once to remove the fore- 
grounds and identify the bright voxels and then fit again 
excluding them. 
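The two pass fit described above can be sketched as follows. The polynomial order, the threshold used to flag bright voxels, and the toy spectrum are assumptions for illustration; the least squares solver is a plain normal equation implementation that is adequate for low polynomial orders on a normalized wavelength axis.

```python
def polyfit(x, y, order):
    """Least-squares polynomial coefficients (lowest power first),
    via the normal equations and Gauss-Jordan elimination."""
    n = order + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [u - factor * v for u, v in zip(a[row], a[col])]
                b[row] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]


def polyval(coeffs, xi):
    return sum(c * xi ** power for power, c in enumerate(coeffs))


def subtract_smooth(x, spectrum, order=2, n_sigma=3.0):
    """Fit once, flag voxels well above the residual scatter, refit without them."""
    coeffs = polyfit(x, spectrum, order)
    resid = [s - polyval(coeffs, xi) for xi, s in zip(x, spectrum)]
    rms = (sum(r * r for r in resid) / len(resid)) ** 0.5
    keep = [i for i, r in enumerate(resid) if r <= n_sigma * rms]
    coeffs = polyfit([x[i] for i in keep], [spectrum[i] for i in keep], order)
    return [s - polyval(coeffs, xi) for xi, s in zip(x, spectrum)]


# Toy pixel: a smooth foreground plus one bright line in voxel 40.
x = [i / 50.0 - 1.0 for i in range(101)]          # normalized wavelength axis
spectrum = [1000.0 + 300.0 * xi + 50.0 * xi ** 2 for xi in x]
spectrum[40] += 20.0
cleaned = subtract_smooth(x, spectrum)
print(cleaned[40])
```

Excluding the flagged voxel from the second fit prevents the smooth model from absorbing part of the line, so the line amplitude survives the subtraction.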

There will necessarily be some signal lost on large
scales as a result of the fitting and subtraction stage.
Fortunately, as discussed in Ref. @, if we decompose
our signal into Fourier modes, the lost signal is only
from small $k$-modes (corresponding to long wavelengths)
along the line of sight. If we exclude these corrupted
$k$-modes in step 3 of measuring the cross power spectrum,
we still have an unbiased estimation of the cross power
spectrum without subtraction losses. Note that throwing
away the low $k$-modes does have a price. Since there
are fewer statistical samples of modes, this procedure increases
the variance of power spectrum measurements on
large scales. Because we wish to minimize the number of
these corrupted modes, we fit with the lowest order polynomial
which leaves no significant residual foregrounds.

After we have subtracted away the foreground and continuum
emission it is necessary to remove voxels with
very bright bad line emission. Even though line emission
from bright foreground galaxies does not bias our
measurements of the power spectrum, it does increase
the error of our measurements through its contribution
to the total power spectra in Eq. (7).

The masking procedure must be done carefully in order
to not introduce spurious correlations between the
two data sets being cross correlated. For example, if one
simply sets all voxels above some threshold signal equal
to zero, a spurious change to the cross power spectrum is
introduced (see Fig. 5). This is because the locations of
the brightest voxels (mainly due to contaminating bright
foreground galaxies) are correlated with the distribution
of target line emission. The signal from the bad lines and
the target lines overlap, so that bright bad lines which appear
in the data at locations of over-densities in the target
lines are more likely to be above the removal threshold.
Thus, the bad lines left after masking in one data cube
will be anti-correlated with the target lines in the other
cube. This causes the measured cross power spectrum to
be lower than what would be measured from the target
lines alone.

In order to avoid this type of complication one can
mask out voxels in a way which is uncorrelated with the
target line emission being measured. This can be done
by identifying individual bright sources instead of just removing
the brightest voxels in the data. The voxels with
bright contaminating lines can then be set to zero. These
sources could be identified by looking at a series of different
wavelengths and identifying them with multiple lines.
Entirely different surveys could also be used to determine
where contaminating lines from bright foreground galaxies
will appear, so that they can be removed. In our examples
below, we assume that all of the galaxies which emit lines
brighter than five times the RMS detector noise can be
identified directly. When setting the masked voxels to
zero we treat this as a change in the window function,
$W(\mathbf{r})$, which appears in Eq. (6). We normalize this new
window function such that $\int W(\mathbf{r})\, d^3r = 1$.
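A minimal sketch of this bookkeeping is shown below: the mask is built from a list of known source positions rather than from the brightness of the data cube itself, and the window is renormalized over the voxels that remain. The voxel indices and voxel volume are hypothetical.

```python
def mask_sources(n_voxels, source_voxels):
    """Boolean mask built from a catalog of known bright sources
    (the voxel indices passed in are hypothetical)."""
    flagged = set(source_voxels)
    return [i in flagged for i in range(n_voxels)]


def window_from_mask(masked, voxel_volume):
    """Window W(r): zero in masked voxels and constant elsewhere,
    normalized so that the sum of W * voxel_volume equals 1."""
    n_kept = sum(1 for m in masked if not m)
    if n_kept == 0:
        raise ValueError("all voxels are masked")
    value = 1.0 / (n_kept * voxel_volume)
    return [0.0 if m else value for m in masked]


mask = mask_sources(10, [2, 7])
window = window_from_mask(mask, voxel_volume=0.5)
print(sum(w * 0.5 for w in window))   # normalization check
```

Because the mask depends only on the external source list, it carries no information about the target line fluctuations in either data cube.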

In the final step, we take the discrete Fourier transform
of the two 3D data cubes being cross correlated. The
estimation of the power spectrum at some particular $k$
value is then given by the real part of the product of the
survey volume, the Fourier mode of one data set, and
the complex conjugate of the same Fourier mode in the
other data set. This is equivalent to Eq. (5). Finally, we
break $k$-space into spherical shells with uniform thickness
in $\log(k)$. We then take the average estimated power
spectrum of all the modes contained within each shell.
As discussed above, we do not include low $k$-modes along
the line of sight which have been contaminated during
the fitting and subtraction stage. Specifically, we do not
include $k$-modes which have a component along the line
of sight smaller than $k_{\rm cut}$, the lowest value for which
there is no significant contamination. In the examples
below we find that for $k_{\rm cut} = 0.06\,h\,{\rm Mpc}^{-1}$ there is no
significant loss of power due to the foreground removal
process.
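The shell averaging and the line of sight cut can be sketched as follows, taking as input a list of per mode power estimates. The shell edges, the number of shells, and the example modes are illustrative assumptions; only the cut value of 0.06 is taken from the text.

```python
import math


def shell_average(modes, k_min, k_max, n_shells, k_cut):
    """Average per-mode power estimates in shells of uniform width in log(k).

    modes: iterable of (kx, ky, k_los, power), with k_los along the line of sight.
    Modes with |k_los| < k_cut are dropped as foreground-contaminated.
    Returns a list of (mean power or None, number of modes) per shell.
    """
    width = (math.log(k_max) - math.log(k_min)) / n_shells
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for kx, ky, k_los, power in modes:
        if abs(k_los) < k_cut:
            continue
        k = math.sqrt(kx * kx + ky * ky + k_los * k_los)
        if not (k_min <= k < k_max):
            continue
        index = int((math.log(k) - math.log(k_min)) / width)
        index = min(index, n_shells - 1)   # guard against rounding at the top edge
        sums[index] += power
        counts[index] += 1
    return [(s / c if c else None, c) for s, c in zip(sums, counts)]


example_modes = [
    (0.1, 0.0, 0.1, 2.0),
    (0.1, 0.0, -0.1, 4.0),
    (0.3, 0.0, 0.01, 100.0),   # removed by the line-of-sight cut
    (0.5, 0.0, 0.5, 10.0),
]
shells = shell_average(example_modes, k_min=0.05, k_max=1.0, n_shells=2, k_cut=0.06)
print(shells)
```

The mode with a small line of sight component is discarded even though its total wavenumber lies inside the binned range, which is the source of the increased variance on large scales.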



III. SPICA 

A. Instrument 

We consider two different examples of instruments and
lines which could be used to measure the galaxy line cross
power spectrum. In our first example, we envision an instrument
on the planned Space Infrared Telescope for
Cosmology and Astrophysics (SPICA) [27]. SPICA is
a 3.5 meter space-borne infrared telescope planned for
launch in 2017. It will be cooled below 5 K, providing
measurements which are orders of magnitude more sensitive
than those from current instruments. We consider
an instrument based on the proposed high performance
spectrometer μ-spec (H. Moseley, private communication
2009). This instrument will provide background limited
sensitivity with wavelength coverage from 250-700 μm.
A number of μ-spec units will be combined to record
both spatial and spectral data in each pointing, which
is well suited for intensity mapping. We assume
that spectra for 100 diffraction limited beams can
be measured simultaneously with a resolving power of
$R = (\nu/\Delta\nu) = 1000$.



B. Results 

We use the simulation described above to create a synthetic
data set and measure the cross power spectrum
with the SPICA/μ-spec instrument. We cross correlate
OI(63 μm) and OIII(52 μm) from galaxies at a redshift
of z = 6. We assume the data covers a square on the
sky which is 1.7 degrees across (corresponding to 250
Mpc) and a redshift range of $\Delta z = 0.6$ (corresponding
to 280 Mpc). We assume a total integration time of $2 \times 10^6$
seconds spread uniformly across this survey area.
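The survey dimensions can be checked roughly against the adopted cosmology with a direct numerical integration of the comoving distance. This is only a sketch: it assumes a flat ΛCDM model with the parameters quoted in the Introduction, neglects radiation, and takes the redshift interval to be centered on z = 6, which the text does not state explicitly.

```python
import math

C_KM_S = 299792.458
H0 = 70.0                        # km/s/Mpc, from h = 0.7
OMEGA_M, OMEGA_L = 0.27, 0.73    # flat LCDM parameters adopted in the paper


def hubble(z):
    """H(z) in km/s/Mpc, neglecting radiation."""
    return H0 * math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, steps=10000):
    """Line-of-sight comoving distance in Mpc (midpoint-rule integration)."""
    dz = z / steps
    return sum(C_KM_S / hubble((i + 0.5) * dz) for i in range(steps)) * dz


def transverse_size(z, angle_deg):
    """Comoving size in Mpc subtended by angle_deg at redshift z."""
    return comoving_distance(z) * math.radians(angle_deg)


def radial_depth(z_lo, z_hi):
    """Comoving depth in Mpc between two redshifts."""
    return comoving_distance(z_hi) - comoving_distance(z_lo)


width = transverse_size(6.0, 1.7)
depth = radial_depth(5.7, 6.3)   # assumption: interval centered on z = 6
print(f"1.7 deg at z = 6: {width:.0f} Mpc")
print(f"dz = 0.6 around z = 6: {depth:.0f} Mpc")
```

The transverse size should come out close to the quoted 250 Mpc, and the depth is comparable to the quoted value, with the exact figure depending on how the redshift interval is defined.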

In Figure 4 we show that using the procedure described
above we can accurately measure the cross power
spectrum. We show both the cross power spectrum of
the emission from the target lines alone as well as that
which is recovered when bad lines, detector noise, and
foregrounds are included. The error in measuring the
power spectrum is consistent with the analytical prediction
derived in Ref. [1]. The details introduced in
our simulation and measurements, such as removing the
foregrounds and masking out bright foreground galaxies,
do not bias our estimate of the power spectrum or increase
the uncertainty implied by Eq. (7). Other details
of this example are presented in Table III.

In Figure 5 we show the effects on the measured cross
power spectrum of masking out all bright voxels. We 
have plotted the power spectrum from the target lines 
alone and also with the bad lines using the same mask 
in both cases. Clearly, the anti-correlation between the 
masked bad lines and the target lines in the other data 
set described above has biased the cross power spectrum 
measurement. 

We find that increasing the sky coverage (i.e. shorter 
integrations for each pointing on the sky, but larger sky 
coverage) increases our errors in the power spectrum. 
This is due to our assumptions about masking bright 
bad lines. As the survey becomes wider the detector 
noise goes up and the increased number of bright bad 
lines which arc not masked increases the errors on the 
power spectrum. One would not want to go much deeper 
over a smaller patch of sky than we consider, because 
we are already masking roughly 10% of each data cube. 



[Figure 4 plot; horizontal axis: $k$ [$h\,{\rm Mpc}^{-1}$]]

FIG. 4: The cross power spectrum of OI(63 μm) and OIII(52 μm) at $z = 6$ measured from simulated data for our hypothetical
instrument modeled after SPICA. The blue curve is the cross power spectrum measured when only line emission from galaxies
in the target lines is included. The green points are the recovered power spectrum when detector noise, bad line emission,
galaxy continuum emission, and bright astrophysical foreground and background emission (i.e. dust in our galaxy and the
CMB) are included. The error bars are the theoretical prediction of the root mean square error derived in [1] and given by
Eq. (7). In determining the error bars we have estimated $P_{1\rm total}$ and $P_{2\rm total}$ using our simulated data. These errors include
detector noise, bad line emission and sample variance.



Without using the increased sensitivity to remove more 
of the bright bad lines, going deeper and narrower would
increase the noise in the power spectrum due to increased 
sample variance. 



IV. INTENSITY MAPPING CO(1-0) AND CO(2-1)



If the mask did not depend on the integration time
(e.g. if it were obtained from a different survey of
foreground galaxies), it would be straightforward to
determine the optimal sky coverage for measuring power
on a particular scale in a given observing time.
Minimizing Eq. (7) with respect to the time integrated
per pointing, holding the total observation time fixed,
one finds that the optimal coverage sets

$P_{\rm noise1} P_{\rm noise2} = P_{1,2}^2 + (P_{1\rm total} - P_{\rm noise1})(P_{2\rm total} - P_{\rm noise2})$.

That is, the product of the detector noise power spectra
equals the sum of the sample variance contributions to
the power spectrum uncertainty.
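The condition above can be checked with a few lines of algebra or numerically. The following is a minimal sketch, assuming that Eq. (7) gives a variance proportional to $(P_{1,2}^2 + P_{1\rm total} P_{2\rm total})/N_m$, that the number of modes $N_m$ is proportional to the survey area $A$, and that at fixed total observing time the detector noise power in each cube grows linearly with $A$. Writing $P_{i\rm total} = s_i + n_i A$, the variance becomes $(P_{1,2}^2 + s_1 s_2)/A + (s_1 n_2 + s_2 n_1) + n_1 n_2 A$, which is minimized when $n_1 n_2 A^2 = P_{1,2}^2 + s_1 s_2$. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

def relative_variance(area, p12, s1, s2, n1, n2):
    """Variance of the band-averaged cross power, up to a constant factor.

    Assumptions (not taken from the paper's text): the variance scales as
    (P12^2 + P1total * P2total) / N_modes, N_modes grows linearly with
    survey area, and detector noise power grows linearly with area when
    the total observing time is held fixed.
    """
    noise1 = n1 * area  # detector noise power in cube 1
    noise2 = n2 * area  # detector noise power in cube 2
    return (p12**2 + (s1 + noise1) * (s2 + noise2)) / area

def optimal_area(p12, s1, s2, n1, n2):
    """Area at which the derivative of relative_variance vanishes."""
    return np.sqrt((p12**2 + s1 * s2) / (n1 * n2))

# Hypothetical inputs in arbitrary units:
# p12 = cross power, s_i = non-noise part of P_i_total, n_i = noise power per unit area
p12, s1, s2, n1, n2 = 1.0, 3.0, 2.0, 0.5, 0.8

a_opt = optimal_area(p12, s1, s2, n1, n2)

# Brute-force check on a fine grid of areas
areas = np.linspace(0.2 * a_opt, 5.0 * a_opt, 20001)
a_numeric = areas[np.argmin(relative_variance(areas, p12, s1, s2, n1, n2))]

noise_product = (n1 * a_opt) * (n2 * a_opt)   # P_noise1 * P_noise2 at the optimum
sample_term = p12**2 + s1 * s2                # P12^2 + (P1total - Pn1)(P2total - Pn2)
```

At the optimum, `noise_product` and `sample_term` agree, which is the condition quoted in the text.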



As another example we consider intensity mapping of
the cross correlation between CO(1-0) and CO(2-1) at
high redshifts with a dedicated instrument currently
being planned (J. Bowman 2011, private communication).
Other similar instruments are also being planned
(G. Bower 2011, private communication). The instrument
consists of two telescopes: a 20 meter dish and a 10
meter dish to observe CO(1-0) and CO(2-1) respectively.
Each of these telescopes can simultaneously observe
3 deg$^2$ of the sky with angular resolution set by the
beam size (3.5-5 arcmin at $z = 7$-$10$). We assume a
spectral resolution of $R = \nu/\Delta\nu = 1000$. While
the actual instrument will have a higher resolution, this
is sufficient to measure fluctuations on the scales we
consider. To determine the detector noise we use the
radiometer equation
TABLE III: Summary of the results from the example of cross correlating OI(63 μm) and OIII(52 μm) with SPICA. The RMS
detector noise is the value in each voxel. The bad line power to detector noise power ratio gives the relative contributions to the
statistical error in the cross power spectrum due to the auto-correlations from all the bad lines and the detector noise which
appear in Eq. (7).

                                                              OI(63 μm)    OIII(52 μm)

Average Line Signals ($S_{\rm line}$)                         20 Jy/Sr     14 Jy/Sr
Fraction of Voxels Masked                                     0.097        0.11
RMS Detector Noise                                            700 Jy/Sr    400 Jy/Sr
Brightness of CMB+Dust                                        4 MJy/Sr     2 MJy/Sr
Bad line Power/Noise Power ($k = 0.3\,h\,{\rm Mpc}^{-1}$)     6.5          8.1
Bad line Power/Noise Power ($k = 1\,h\,{\rm Mpc}^{-1}$)       1.4          1.7
Cross Power S/N per $k$-mode ($k = 0.3\,h\,{\rm Mpc}^{-1}$)   0.17
Cross Power S/N per $k$-mode ($k = 1\,h\,{\rm Mpc}^{-1}$)     0.14


TABLE IV: Summary of the results from the example of cross correlating CO(1-0) and CO(2-1). The RMS detector noise is
the value in each voxel. The bad line power to detector noise power ratio gives the relative contributions to the statistical error
in the cross power spectrum due to the auto-correlations from all the bad lines and the detector noise which appear in Eq. (7).

                                                              CO(1-0)      CO(2-1)

Average Line Signals ($S_{\rm line}$)                         0.1 μK       0.094 μK
Fraction of Voxels Masked                                     0.0          0.015
RMS Detector Noise                                            1.0 μK       0.7 μK
Bad line Power/Noise Power ($k = 0.1\,h\,{\rm Mpc}^{-1}$)     0.0          7.0
Bad line Power/Noise Power ($k = 0.3\,h\,{\rm Mpc}^{-1}$)     0.0          1.5
Bad line Power/Noise Power ($k = 0.8\,h\,{\rm Mpc}^{-1}$)     0.0          0.5
Cross Power S/N per $k$-mode ($k = 0.1\,h\,{\rm Mpc}^{-1}$)   0.24
Cross Power S/N per $k$-mode ($k = 0.3\,h\,{\rm Mpc}^{-1}$)   0.12
Cross Power S/N per $k$-mode ($k = 0.8\,h\,{\rm Mpc}^{-1}$)   0.05



where $T_{\rm sys}$ is the system temperature which we have
assumed to be 30 K, $t$ is the integration time which we
have assumed is $3 \times 10^7$ s, and the factor of $\sqrt{2}$
appears in the denominator because the intensity will be
mapped from dual polarization. We create a synthetic data
set for this instrument centered at $z = 7.5$ and the
recovered cross power spectrum is shown in Figure 6. We
summarize some other properties of this simulated
measurement in Table IV.
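As a rough cross-check of the detector noise entering this example, the sketch below evaluates the standard radiometer equation in the form $\sigma_T = T_{\rm sys}/\sqrt{2\,\Delta\nu\,t}$. This form, the symbol $\sigma_T$, the assumption that the full $3 \times 10^7$ s is spent on every voxel, and the function names are assumptions made for illustration; the CO rest frequencies (115.271 and 230.538 GHz) are standard values. With $R = 1000$ at $z = 7.5$ the result is roughly 1 μK for CO(1-0) and 0.7 μK for CO(2-1), in line with the RMS detector noise listed in Table IV.

```python
import math

# Rest-frame frequencies of the two target lines in GHz
CO_REST_GHZ = {"CO(1-0)": 115.271, "CO(2-1)": 230.538}

def channel_width_hz(rest_ghz, z, resolving_power):
    """Channel width for R = nu / delta_nu at the observed frequency."""
    observed_hz = rest_ghz * 1e9 / (1.0 + z)
    return observed_hz / resolving_power

def radiometer_noise_kelvin(t_sys_k, bandwidth_hz, t_int_s):
    """RMS noise per voxel; the factor 2 accounts for dual polarization.

    Assumed form: sigma_T = T_sys / sqrt(2 * bandwidth * integration time).
    """
    return t_sys_k / math.sqrt(2.0 * bandwidth_hz * t_int_s)

T_SYS_K = 30.0    # system temperature quoted in the text
T_INT_S = 3.0e7   # integration time quoted in the text
Z_CENTER = 7.5    # central redshift of the synthetic data set
R = 1000.0        # spectral resolution quoted in the text

noise_uK = {
    line: 1e6 * radiometer_noise_kelvin(
        T_SYS_K, channel_width_hz(rest_ghz, Z_CENTER, R), T_INT_S
    )
    for line, rest_ghz in CO_REST_GHZ.items()
}

for line, sigma in noise_uK.items():
    print(f"{line}: {sigma:.2f} microkelvin per voxel")
```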



V. DISCUSSION AND CONCLUSIONS 

By cross correlating emission in different spectral lines 
from the same galaxies, it is possible to measure their 
clustering. This clustering, quantified by the line cross 
power spectrum, can be measured for galaxies which are 
too faint to detect individually, but which can be observed
in aggregate due to their large numbers [1].

In this paper, we have shown that the line cross power
spectrum can be accurately measured with future
instruments, based on synthetic data created using
cosmological dark matter simulations. We produced our
synthetic data by associating dark matter halos with
galaxies and assigning each a spectrum. The continuum was
generated by scaling that of M82 with the SFR in each halo
and line emission was set by calibrating with lower
redshift galaxies. The SFR was computed for halos with an
abundance matching technique calibrated to observations
of galaxy UV luminosity functions. Our synthetic data
also included detector noise and bright emission due to
astrophysical foregrounds and backgrounds such as that
from dust in our galaxy and the CMB. Even if our simple
prescription deviates somewhat from reality, it still
illustrates our main point, that whatever the underlying
power spectrum of emission from galaxies is, it can
be measured with the accuracy predicted analytically by
Eq. (7). It is reassuring that the complications addressed
in our simulations such as removal of bright astrophysical
foregrounds and masking out bright bad line emission
do not hinder measurement of the power spectrum compared
to the analytic expectations from Ref. [1].

FIG. 5: The measured power spectrum when masking is done by simply setting the fluctuations in all voxels with signal greater
than five times the RMS detector noise to zero. We use the same instrumental and survey parameters as in Figure 4, except
we set the detector noise in each data cube to zero to more clearly demonstrate the masking effect. The points plotted are the
measurements of the cross power spectrum after all bright voxels have been masked and set to zero. The error bars show the
standard deviation in the cross power spectrum of all the modes sampled at each $k$-value. The line is the power spectrum if
only the target galaxy lines are included (but using the same mask). Clearly the anti-correlation between the masked bad lines
and the target lines in opposite cubes produces a systematic shift in the power spectrum as described in the text.

Measuring the line cross power spectrum consists of
three main steps. First, a smooth function such as a
polynomial is fit to each pixel on the sky and subtracted
from the data to remove smooth foregrounds and the
continuum emission from galaxies. Next, bright voxels are
masked out. One must be careful in the masking technique
as it is possible to introduce spurious correlations
if the masks are correlated with the target lines which
appear in both data cubes. This can be avoided if bright
sources are found individually at high significance and
the corresponding voxels with bright contaminating lines
are set to zero. Finally, the data is Fourier transformed
and then the power spectrum is averaged in spherical
shells. Modes corresponding to long wavelengths along
the line of sight are not included, because they are
contaminated during the fitting and subtraction step.
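The three steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline used for the results in this paper: the cubes are periodic boxes with frequency along the last axis, the smooth function is a cubic polynomial, the masks are drawn at random as a stand-in for an external catalog of bright sources, and all function names, box sizes and amplitudes are hypothetical.

```python
import numpy as np

def subtract_smooth_fit(cube, order=3):
    """Step 1: least-squares fit a polynomial in frequency to every sky pixel and subtract it."""
    n_freq = cube.shape[-1]
    x = np.linspace(-1.0, 1.0, n_freq)
    basis = np.vander(x, order + 1)                  # (n_freq, order + 1)
    spectra = cube.reshape(-1, n_freq).T             # (n_freq, n_pixels)
    coeffs, *_ = np.linalg.lstsq(basis, spectra, rcond=None)
    return cube - (basis @ coeffs).T.reshape(cube.shape)

def apply_mask(cube, mask):
    """Step 2: set flagged voxels to zero (mask is True where a contaminant was identified)."""
    return np.where(mask, 0.0, cube)

def cross_power(cube1, cube2, box_lengths, k_edges, k_par_min):
    """Step 3: Fourier transform and average the cross power in spherical shells,
    skipping line-of-sight modes with |k_parallel| < k_par_min."""
    shape = cube1.shape
    volume = float(np.prod(box_lengths))
    voxel_volume = volume / cube1.size
    ft1 = np.fft.fftn(cube1) * voxel_volume
    ft2 = np.fft.fftn(cube2) * voxel_volume
    power = (ft1 * np.conj(ft2)).real / volume
    axes = [2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
            for n, length in zip(shape, box_lengths)]
    kx, ky, kz = np.meshgrid(*axes, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)
    usable = np.abs(kz) >= k_par_min
    binned = []
    for lo, hi in zip(k_edges[:-1], k_edges[1:]):
        in_shell = usable & (k_mag >= lo) & (k_mag < hi)
        binned.append(power[in_shell].mean())        # NaN if a shell contains no modes
    return np.array(binned)

# Toy data: a shared white-noise "target line" signal, a bright smooth foreground,
# independent detector noise in each cube, and independent 2% random masks.
rng = np.random.default_rng(0)
shape, box = (32, 32, 32), (32.0, 32.0, 32.0)
signal = rng.normal(size=shape)
x = np.linspace(-1.0, 1.0, shape[-1])
foreground = 1e3 * rng.uniform(1.0, 2.0, size=shape[:2] + (1,)) * (1.0 + 0.5 * x + 0.2 * x**2)
cubes = [signal + foreground + 0.5 * rng.normal(size=shape) for _ in range(2)]
masks = [rng.random(shape) < 0.02 for _ in range(2)]

cleaned = [apply_mask(subtract_smooth_fit(c), m) for c, m in zip(cubes, masks)]
p_cross = cross_power(cleaned[0], cleaned[1], box, k_edges=[1.0, 3.0], k_par_min=0.8)
```

For this white-noise toy signal with unit variance and unit voxel volume, the expected cross power is close to 1, reduced slightly by the masking and by the polynomial subtraction. Because the masks here are independent of the signal, they do not introduce the spurious correlations discussed in the text.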

We find that the line cross power spectrum can be
measured with the accuracy predicted analytically by
Eq. (7), derived in Ref. [1]. In particular, we tested two
hypothetical instruments, an instrument mounted on SPICA
and a pair of large ground based telescopes designed to
measure the emission of CO(1-0) and CO(2-1). Though
not included in our examples, it would also be valuable
to measure emission from more than two different lines
from the same redshifts. This could improve statistics
and allow determination of the ratio of line emission in
different lines by taking the ratio of different cross power
spectra.
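As a brief illustration of how such ratios isolate line properties, suppose that on large scales the cross power of lines $i$ and $j$ factorizes as $P_{i,j}(k) = \bar{S}_i \bar{S}_j \bar{b}_i \bar{b}_j P(k)$, where $\bar{S}_i$ is the average line signal, $\bar{b}_i$ is the luminosity-weighted bias and $P(k)$ is the matter power spectrum. This linear-bias form and the notation are assumptions made here for the sketch. Then, for three lines,

```latex
\frac{P_{1,2}(k)}{P_{1,3}(k)} = \frac{\bar{S}_2\,\bar{b}_2}{\bar{S}_3\,\bar{b}_3},
\qquad
\frac{P_{1,2}(k)\,P_{1,3}(k)}{P_{2,3}(k)} = \bar{S}_1^{2}\,\bar{b}_1^{2}\,P(k).
```

If the lines trace the same galaxies with similar bias, the first ratio reduces to the ratio of the average line signals.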



Our results suggest that cross correlating galaxy line 
emission is a promising technique for studying high red- 
shift galaxies. It will enable one to measure the evolution 
of the total line signal from all galaxies at a particular 
redshift, even those that are too faint to be resolved in- 
dividually. This could reveal details about the evolution 
of galaxies' properties such as SFR density or average 
metallicity. It may also be possible to use these obser- 
vations to study the history of cosmic reionization, both 
by estimating the ionizing flux from faint galaxies and by 
looking for a sharp change in signal versus redshift due 
to the change in the minimum mass of halos which host 
galaxies. 



[Figure 6 plot; horizontal axis: $k$ [$h\,{\rm Mpc}^{-1}$]]

FIG. 6: The cross power spectrum of CO(1-0) and CO(2-1) from a central redshift of $z = 7.5$ measured with the telescope
described in Section IV. An integration of $3 \times 10^7$ s and a redshift range of $\Delta z = 0.9$ are assumed. The solid blue line is the power
spectrum of the CO line emission alone measured from our simulated data. The green points are the measurements of the
power spectrum recovered when the full simulated data set is used. This includes detector noise and bad line emission from
the other lines in Table I. The error bars are calculated from Eq. (7), where we have estimated $P_{1\rm total}$ and $P_{2\rm total}$ using our
simulated data. These errors include detector noise, bad line emission and sample variance.